This analysis is concerned with identifying differentially expressed genes across disease states. Primarily we are interested in whether there is a particular gene expression signature that is associated with PSC/UC as opposed to UC alone. There are some complicating factors in the analysis in that a small subset of samples are mildly inflamed - this could affect differential expression results. Nevertheless, I have presiously shown that there are no statistically significant differences betwen PSC/UC and UC in terms of Nancy scores at each of the tissue sites sampled.
I have decided to perform differential expression analysis at each site separately and then compare the results. Initially I am using the LRT test in DESeq2 to define differentially expressed genes between healthy, PSC/UC and UC. The idea is that we could do something similar to the between tissue analyses if the differences in disease groups are not too subtle. First I will perform some further exploratory analysis.
I will base some of the filtering steps on the number of samples that I have in each tissue/disease group. Below is a summary of these numbers. The filtering is the same as was performed for defining differentially expressed genes between tissues. This allows a more straightforward comparison between analyses.
| HEALTHY | PSC/UC | UC | |
|---|---|---|---|
| Caecum | 9 | 7 | 10 |
| Ileum | 11 | 7 | 9 |
| Rectum | 12 | 7 | 10 |
Significance of metadata variables associated with group status is explored here. I feel like these are better as a plot rather than a table but could be wrong. Plot age, sex, nancy.score.at.site, UCEIS,
## Sex:
## Age:
## nancy:GI#2876-HEALTHY
## nancy:GI#2902-HEALTHY
## nancy:GI#2907-HEALTHY
## nancy:GI#2911-HEALTHY
## nancy:GI#3180-HEALTHY
## nancy:GI#3189-HEALTHY
## nancy:GI#3352-HEALTHY
## nancy:GI#3859-HEALTHY
## nancy:GI#3938-HEALTHY
## nancy:GI#3951-HEALTHY
## nancy:GI#4134-HEALTHY
## nancy:GI#4221-HEALTHY
## UCEIS:GI#2876-HEALTHY
## UCEIS:GI#2902-HEALTHY
## UCEIS:GI#2907-HEALTHY
## UCEIS:GI#2911-HEALTHY
## UCEIS:GI#3180-HEALTHY
## UCEIS:GI#3189-HEALTHY
## UCEIS:GI#3352-HEALTHY
## UCEIS:GI#3859-HEALTHY
## UCEIS:GI#3938-HEALTHY
## UCEIS:GI#3951-HEALTHY
## UCEIS:GI#4134-HEALTHY
## UCEIS:GI#4221-HEALTHY
## Urso:
## vedolizumab:
## AZA.6MP.ISS:
## X5.ASA:
## steroid:
This is an initial look at principle components using log2cpm. I have already had a look at this as an exploratory analysis and removed patients that were outliers i.e. their expression profiles were inconsistent with tissue patterns and may have been the result of a sample swap. Also removed was a patient that had polyps.
Here I look explicitly at whether there is a significant association between overall expression patterns and tissue site.
| Df | SumsOfSqs | MeanSqs | F.Model | R2 | Pr(>F) | |
|---|---|---|---|---|---|---|
| Tissue.location | 2 | 0.0997791 | 0.0498896 | 25.51449 | 0.3924431 | 0.001 |
| Residuals | 79 | 0.1544721 | 0.0019553 | NA | 0.6075569 | NA |
| Total | 81 | 0.2542512 | NA | NA | 1.0000000 | NA |
Here I look at the principle components analysis at each tissue location separately. This will help to detect if there are any differences by disease status that are somewhat masked by the huge amount of variation explained by tissue location.
There is not a clear clustering by group in terms of overall gene expression patterns. Nevertheless we can test for association using a PERMANOVA.
Below I test whether there is a difference in overall expression profiles by disease status. This is done in each tissue separately.
| Df | SumsOfSqs | MeanSqs | F.Model | R2 | Pr(>F) | |
|---|---|---|---|---|---|---|
| Disease | 2 | 0.0059934 | 0.0029967 | 1.614911 | 0.1186134 | 0.051 |
| Residuals | 24 | 0.0445357 | 0.0018557 | NA | 0.8813866 | NA |
| Total | 26 | 0.0505292 | NA | NA | 1.0000000 | NA |
| Df | SumsOfSqs | MeanSqs | F.Model | R2 | Pr(>F) | |
|---|---|---|---|---|---|---|
| Disease | 2 | 0.0032406 | 0.0016203 | 1.061678 | 0.0845172 | 0.36 |
| Residuals | 23 | 0.0351015 | 0.0015262 | NA | 0.9154828 | NA |
| Total | 25 | 0.0383421 | NA | NA | 1.0000000 | NA |
| Df | SumsOfSqs | MeanSqs | F.Model | R2 | Pr(>F) | |
|---|---|---|---|---|---|---|
| Disease | 2 | 0.0041217 | 0.0020609 | 0.8715505 | 0.0628301 | 0.653 |
| Residuals | 26 | 0.0614791 | 0.0023646 | NA | 0.9371699 | NA |
| Total | 28 | 0.0656008 | NA | NA | 1.0000000 | NA |
There look to be very subtle overall differences in overall gene expression profiles in terms of disease status. The strongest effects are in the ileum as noted by the highest R^2.
Below is the results of differential expression analysis using DESeq2. No covariates are taken into account (numbers are small). Initially I perform an overall test (LRT) to see which genes are differentially expressed between any group.
| nsig.caecum | nsig.ileum | nsig.rectum |
|---|---|---|
| 17 | 111 | 27 |
## png
## 2
Below is a heatmap of all of the observed changes (union at each tissue site). This is to try to get a handle on whether there are any patterns there. I am also including the metadata to see if there are any obvious associations with factors such as inflammation.
Differential expression analysis
Differential expression analysis
Differential expression analysis
In the ileum, the PSC/UC and UC look fairly similar. The caecum looks to be more affected in PSC and the rectum more in UC. These patterns are more or less what we would have expected. As opposed to the tissue-defining clusters that we found there is no obvious clustering here and so pairwise comparisons are potentially going to identify more robust differences.
Below I run the analysis looking at each tissue separately and each pairwise comparison. The table summarises the results of these comparisons. We do not have enough samples to assess the impact of inflammation - in fact the vast majority of samples are not inflamed.
Below are tables that summarise the differentially expressed genes between each disease group at each tissue site.
| Var1 | Freq |
|---|---|
| Down | 45 |
| Up | 151 |
| Var1 | Freq |
|---|---|
| Down | 302 |
| Up | 427 |
| Var1 | Freq |
|---|---|
| Down | 1 |
| Up | 2 |
| Var1 | Freq |
|---|---|
| Down | 174 |
| Up | 46 |
| Var1 | Freq |
|---|---|
| Down | 2 |
| Up | 1 |
| Var1 | Freq |
|---|---|
| Down | 14 |
| Up | 2 |
| Var1 | Freq |
|---|---|
| Up | 1 |
| Var1 | Freq |
|---|---|
| Down | 7 |
| Up | 30 |
| Freq |
|---|
Below I draw a heatmap to show the union of significant differences between any pairwise comparison. I also output a venn diagram to show the overlaps between the tissue sites.
## png
## 2
In the following analysis I look at correlating fold changes between the two pairwise comparisons (i.e. each disease vs. healthy). This will reveal differences between the two disease conditions. It is exprected that if they are truly distinct in transcriptomic signatures that they will not be highly correlated.
| ileum.cor | ileum.p | caecum.cor | caecum.p | rectum.cor | rectum.p | |
|---|---|---|---|---|---|---|
| cor | 0.6748169 | 0 | 0.5055982 | 0 | 0.420959 | 0 |
## png
## 2
## png
## 2
## png
## 2
These are certainly significantly correlated. The ileum shows some interesting patterns - especially for a handful of genes that look as if they are downregulated in UC but up-regulate in PSC. Should see what these are.
Very few of our samples have histological eveidence of inflammation. Nevertheless we are interested in whether there are any molecular (i.e. gene expression) markers of inflammation that might hint at any subclinical inflammation that might be present in the differnetial expression results. To test this I look at a few marker genes that are known to be robustly associated with ulcerative colitis namely, TNF, IL17A, IL12A, IL23A, NOS2 and S100A8. To look at this, I take the union of significantly differentially expressed genes from PSC/UC vs. Healthy and UC vs. Healthy and see whether these genes are differentially expressed.
It’s kind of interesting because the inflammatory markers that I have chosen are only increased in the ileum - this is supposed to be a relatively uninvolved site. There may be some molecular inflammation that does not manifest as overt inflammation. By eye there are no obvious associations with any of the measured covariates like drugs or histology (Nancy here is the Nancy index at the tissue site sampled).
I initially had a hypothesis that the reason for not seeing very strong overlap in disease associations between tissue sites was because tissue-defining gene expression signatures were the things that were altered in disease states. Here I explore this hypothesis by looking at the enrichment of previously defined tissue clusters amongst genes that are significantly differentially expressed in either PSC/UC or UC.
| code | scount | stotal | spercent | bcount | btotal | bpercent | ratio | pvalue | pover | punder | goid | category | description | fdr | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 33 | 131 | 25.19 | 4148 | 12995 | 31.92 | 0.79 | 0.0564600 | 0.9626300 | 0.0564600 | Cluster 1 | Tissue.clusters | Cluster 1 | 0.070575 | Ileum PSC vs. Healthy |
| + | 41 | 131 | 31.30 | 3081 | 12995 | 23.71 | 1.32 | 0.0282480 | 0.0282480 | 0.9822000 | Cluster 2 | Tissue.clusters | Cluster 2 | 0.047080 | Ileum PSC vs. Healthy |
| - | 28 | 131 | 21.37 | 2849 | 12995 | 21.92 | 0.97 | 0.4891600 | 0.5950200 | 0.4891600 | Cluster 3 | Tissue.clusters | Cluster 3 | 0.489163 | Ileum PSC vs. Healthy |
| + | 25 | 131 | 19.08 | 1614 | 12995 | 12.42 | 1.54 | 0.0182420 | 0.0182420 | 0.9900300 | Cluster 4 | Tissue.clusters | Cluster 4 | 0.045605 | Ileum PSC vs. Healthy |
| - | 4 | 131 | 3.05 | 1364 | 12995 | 10.50 | 0.29 | 0.0013872 | 0.9996600 | 0.0013872 | Cluster 5 | Tissue.clusters | Cluster 5 | 0.006936 | Ileum PSC vs. Healthy |
| - | 29 | 169 | 17.16 | 4148 | 12995 | 31.92 | 0.54 | 0.0000097 | 1.0000000 | 0.0000097 | Cluster 1 | Tissue.clusters | Cluster 1 | 0.000012 | Caecum PSC vs. Healthy |
| - | 10 | 169 | 5.92 | 3081 | 12995 | 23.71 | 0.25 | 0.0000000 | 1.0000000 | 0.0000000 | Cluster 2 | Tissue.clusters | Cluster 2 | 0.000000 | Caecum PSC vs. Healthy |
| - | 15 | 169 | 8.88 | 2849 | 12995 | 21.92 | 0.40 | 0.0000052 | 1.0000000 | 0.0000052 | Cluster 3 | Tissue.clusters | Cluster 3 | 0.000009 | Caecum PSC vs. Healthy |
| + | 104 | 169 | 61.54 | 1614 | 12995 | 12.42 | 4.95 | 0.0000000 | 0.0000000 | 1.0000000 | Cluster 4 | Tissue.clusters | Cluster 4 | 0.000000 | Caecum PSC vs. Healthy |
| - | 11 | 169 | 6.51 | 1364 | 12995 | 10.50 | 0.62 | 0.0509010 | 0.9729300 | 0.0509010 | Cluster 5 | Tissue.clusters | Cluster 5 | 0.050901 | Caecum PSC vs. Healthy |
| + | 1 | 1 | 100.00 | 4148 | 12995 | 31.92 | 3.13 | 0.3192000 | 0.3192000 | 1.0000000 | Cluster 1 | Tissue.clusters | Cluster 1 | 0.895037 | Rectum PSC vs. Healthy |
| - | 0 | 1 | 0.00 | 3081 | 12995 | 23.71 | 0.00 | 0.7629100 | 1.0000000 | 0.7629100 | Cluster 2 | Tissue.clusters | Cluster 2 | 0.895037 | Rectum PSC vs. Healthy |
| - | 0 | 1 | 0.00 | 2849 | 12995 | 21.92 | 0.00 | 0.7807600 | 1.0000000 | 0.7807600 | Cluster 3 | Tissue.clusters | Cluster 3 | 0.895037 | Rectum PSC vs. Healthy |
| - | 0 | 1 | 0.00 | 1614 | 12995 | 12.42 | 0.00 | 0.8758000 | 1.0000000 | 0.8758000 | Cluster 4 | Tissue.clusters | Cluster 4 | 0.895037 | Rectum PSC vs. Healthy |
| - | 0 | 1 | 0.00 | 1364 | 12995 | 10.50 | 0.00 | 0.8950400 | 1.0000000 | 0.8950400 | Cluster 5 | Tissue.clusters | Cluster 5 | 0.895037 | Rectum PSC vs. Healthy |
| - | 104 | 454 | 22.91 | 4148 | 12995 | 31.92 | 0.72 | 0.0000104 | 0.9999900 | 0.0000104 | Cluster 1 | Tissue.clusters | Cluster 1 | 0.000026 | Ileum UC vs. Healthy |
| + | 184 | 454 | 40.53 | 3081 | 12995 | 23.71 | 1.71 | 0.0000000 | 0.0000000 | 1.0000000 | Cluster 2 | Tissue.clusters | Cluster 2 | 0.000000 | Ileum UC vs. Healthy |
| - | 92 | 454 | 20.26 | 2849 | 12995 | 21.92 | 0.92 | 0.2093100 | 0.8228800 | 0.2093100 | Cluster 3 | Tissue.clusters | Cluster 3 | 0.242122 | Ileum UC vs. Healthy |
| - | 51 | 454 | 11.23 | 1614 | 12995 | 12.42 | 0.90 | 0.2421200 | 0.8018800 | 0.2421200 | Cluster 4 | Tissue.clusters | Cluster 4 | 0.242122 | Ileum UC vs. Healthy |
| - | 26 | 454 | 5.73 | 1364 | 12995 | 10.50 | 0.55 | 0.0002020 | 0.9999000 | 0.0002020 | Cluster 5 | Tissue.clusters | Cluster 5 | 0.000337 | Ileum UC vs. Healthy |
| - | 4 | 20 | 20.00 | 4148 | 12995 | 31.92 | 0.63 | 0.1845300 | 0.9225500 | 0.1845300 | Cluster 1 | Tissue.clusters | Cluster 1 | 0.184530 | Rectum UC vs. Healthy |
| - | 0 | 20 | 0.00 | 3081 | 12995 | 23.71 | 0.00 | 0.0044410 | 1.0000000 | 0.0044410 | Cluster 2 | Tissue.clusters | Cluster 2 | 0.011102 | Rectum UC vs. Healthy |
| + | 8 | 20 | 40.00 | 2849 | 12995 | 21.92 | 1.82 | 0.0525450 | 0.0525450 | 0.9819000 | Cluster 3 | Tissue.clusters | Cluster 3 | 0.087575 | Rectum UC vs. Healthy |
| + | 8 | 20 | 40.00 | 1614 | 12995 | 12.42 | 3.22 | 0.0017596 | 0.0017596 | 0.9996800 | Cluster 4 | Tissue.clusters | Cluster 4 | 0.008798 | Rectum UC vs. Healthy |
| - | 0 | 20 | 0.00 | 1364 | 12995 | 10.50 | 0.00 | 0.1086600 | 1.0000000 | 0.1086600 | Cluster 5 | Tissue.clusters | Cluster 5 | 0.135825 | Rectum UC vs. Healthy |
It doesn’t look exactly how I was expecting. In fact, particularly in the ileum there is an enrichment of cluster 2 genes (proliferation cluster) amongst differentially expressed genes. Below I explore in a little more detail what these are and how they relate to tissue-defining clusters.
Interestingly it is not particularly genes that differentiate the ileum from other tissues that are altered in PSC/UC. Indeed, there is an enrichment of cluster 2 genes. Below I explore what these genes are.
Heatmap of everything. Will see how this looks in terms of adding into the manuscript.
There seems to be an increase in proliferative cells in PSC/UC ileum. I next investigate whether this is linked to B-cells i.e. those that are expressing antibody (BCR) which are cluster 4 genes. This would imply that there is an increase in proliferating B-cells in disease ileum vs. healthy.
Cluster 4 genes are enriched in PSC/UC associated genes. Here I explore what these are.
FOXP3 is pretty interesting especially in PSC. This is because it seems to be up in the ileum and down in the caecum. This may suggest that these two tissues are linked by Tregs. If this was the case then we would expect to see a negative correlation between FOXP3 expression in the ileum vs. the caecum.
There is no real correlation there so it is unlikely that the effects in one tissue are affecting the other.
It is of interest that FOXP3 and IL7R are both downregulated in PSC/UC. The relationship between these two genes is somewhat hard to interpret - IL7 is important for Treg development in the thymus but it seems that in inflammation IL7R expression on a subset of T-cells (not Tregs) represents a pathogenic class of T cells. There is also evidence that activated Tregs express high levels of IL7R. Below I look at the correlation beween IL7R and FOXP3 expression within patients with PSC/UC. If co-expressed it would suggest that these genes are expressed from the same cell types.
These data are interesting as there is a definite positive correlation between FOXP3 and IL7R. I did a quick check for whether there was a significant difference between correlation coefficients PSC/UC and UC and there isn’t (data not shown here). It is likely that the numbers are so small that we don’t get a robust correlation. Therefore I would like to use more data - to this end I am going to try to remove the effect of disease from the expression measurements and do the correlations on all of the data at the same time. This should remove spurous correlation that could occur due to differences between groups but retain power to get a robust correlation. I do this using the limma function removeBatchEffect().
This appears to represent the average correlation which is what would be expected. The increased numbers means that the correlation is significant - it comes down to what to present in the manuscript. There is a balance between something that could end up being confusing due to removal of batch effects or confusing due to differences in correlations between groups. I think I will go for the former as I think it best represents the data.
Given the correlation between IL7R and FOXP3 and evidence that activated Tregs express high levels of IL7R, I thought it would be worth looking at the correlation between these genes and the activation markers, ICOS, CTLA4 and CD103 (ITGAE).
I think that this is pretty convincing evidence that we are looking at activated T cells here. Reductions of FOXP3 and IL7R and likely due to reductions in activated T cells - indeed ICOS is slightly (although not significantly donwregulated in psc). I next explore how to present these data potentially in a single figure rather than multiple scatterplots.
```{all.cor.activation, echo=FALSE, message=FALSE, fig.height=5, fig.width=5}
cors.activation <- corr.test(df.compare.il7r.foxp3[,c(1:7)]) cors.activation.r <- cors.activation\(r cors.activation.p <- cors.activation\)p
cor.colors <- colorRampPalette(c(“white”, blues9[6]))(3) p.cors <- ggcorrplot(cors.activation.r, hc.order = TRUE, type = “upper”, lab=TRUE, colors = cor.colors, p.mat=cors.activation.p) p.cors ggsave(“export.dir/cors_activation.pdf”, plot=p.cors + theme(text=element_text(size=6)), height=3, width=3.5)
I'm not convinced that I like just presenting the correlations and so will see if I can fit the scatterplots in the manuscript.
#### Cluster 4 genes in Rectum UC
<img src="disease_differential_expression_files/figure-html/rectum.uc.cluster4.genes-1.png" width="1440" />
#### Cluster 4 genes in Rectum UC
<img src="disease_differential_expression_files/figure-html/rectum.uc.cluster3.genes-1.png" width="1440" />
### PSC-specific changes
Although globally PSC/UC and UC look quite similar i.e. correlated effect sizes, we did still call a handful of genes as differentially abundant between PSC/UC and UC. Below are the plots of genes called as significantly differentially expressed in each tissue site.
#### Ileum PSC/UC vs. UC
<img src="disease_differential_expression_files/figure-html/psc.specific.ileum-1.png" width="1440" />
#### Caecum PSC/UC vs. UC
<img src="disease_differential_expression_files/figure-html/psc.specific.caecum-1.png" width="1440" />
#### Rectum PSC/UC vs. UC
No genes were significantly differentially expressed between PSC/UC and UC in the rectum.
### GGT1 expression
Manually looking at these genes suggests that these are predominantly lowly expressed and only seen as different in a handful of patients, especially in the ileum. However, one gene particularly stands out and that is GGT1 - it is significantly lower in expression in both the caecum and the ileum.
This gene is potentially very interesting as it regulates glutathione scavenging. It is expressed at the cell membrane and transfers groups from glutahione to cysteine. This then allows uptake of the molecule into cells to be converted back into glutathione that is useful for mopping up reactive oxygen etc. This therefore leads to an hypothesis about its role in inflammation-driven cancer - in the presence of inflammation where there is an increase in reactive oxygen, PSC patients are less able to protect against reactive oxygen-driven DNA-damgage. This may then lead to an increase in the number of somatic, cancer-driving mutations. This is only likely the case where there is inflammation - hence why the ileum may be spared from cancer in these patients.
#### GGT1 and liver function tests
We have data on the outcomes of liver function tests i.e. bilirubin, Alkaline Phosphatase (ALP) and Alanine aminotransferase (ALT). Unfortunately we don't have GGT analysis although I am hoping that we can use these other biochemical measurements as markers of the extent of cholestasis.
#### Biochemical analysis between groups
We have some data for all groups (although some are missing data on liver function tests). Here I just plot the levels that we observe for each individual.
<img src="disease_differential_expression_files/figure-html/biochemistry-1.png" width="960" />
### Correlations with GGT1
I am interested in whether GGT1 expression is associated with bile acid transport - in this way the hypothesis is that it would be negatively coreelated with biochemical markers of cholestasis. Below I look at the correlation between GGT1 and biochemical markers in each tissue site.
#### Bilirubin
<img src="disease_differential_expression_files/figure-html/cor.bil.with.ggt1-1.png" width="1440" />
#### ALT
<img src="disease_differential_expression_files/figure-html/cor.alt.with.ggt1-1.png" width="1440" />
#### ALP
<img src="disease_differential_expression_files/figure-html/cor.alp.with.ggt1-1.png" width="1440" />
There isn't a clear relationship between GGT1 and biochemical markers of cholestasis, although fairly strong negative cor between GGT1 and bilirubin in the ileum. There are a few strong positive associations for ALT and ALP in the rectum. These are however quite dirven by an outlier. I think with such few samples it is very difficult to draw any conclusions from these data.
### CIBERSORT analysis
It was suggested that I run CIBERSORT in order to determine if there were any differences in cell populations between the disease states. This was particularly suggested by Beth to see if there was a reduction in Tregs in the Caecum in PSC/UC. I have used single cell expression data from the [human gut cell atlas](https://www.gutcellatlas.org) in order to generate a custom signatures matrix. This was then used for the deconvolution of cell fractions in our RNA-seq data. I ran this using the web application with the following parameters:
>Running CIBERSORTxFractions...
>[Options] mixture: files/nicholas.ilott@kennedy.ox.ac.uk/cpm_cyber.tsv
>[Options] sigmatrix: files/nicholas.ilott@kennedy.ox.ac.uk/CIBERSORTx_Job12_immune_cells_inferred_phenoclasses.CIBERSORTx_Job12_immune_cells_inferred_refsample.bm.K999.txt
>[Options] perm: 1000
>[Options] verbose: TRUE
>[Options] QN: FALSE
>[Options] outdir: files/nicholas.ilott@kennedy.ox.ac.uk/results/
>[Options] label: Job13
>=============CIBERSORTx Settings===============
>Mixture file: files/nicholas.ilott@kennedy.ox.ac.uk/cpm_cyber.tsv
>Signature matrix file: files/nicholas.ilott@kennedy.ox.ac.uk/CIBERSORTx_Job12_immune_cells_inferred_phenoclasses.CIBERSORTx_Job12_immune_cells_inferred_refsample.bm.K999.txt
>Number of permutations set to: 1000
>Enable verbose output
#### CIBERSORT in the ileum
<img src="disease_differential_expression_files/figure-html/ileum.ciber-1.png" width="2400" />
#### CIBERSORT in the caecum
<img src="disease_differential_expression_files/figure-html/caecum.ciber-1.png" width="2400" />
#### CIBERSORT in the rectum
<img src="disease_differential_expression_files/figure-html/rectum.ciber-1.png" width="2400" />
There is nothing major going on in terms of predicted cell frequencies. This is likely due to the complexity of the regulation as assessed by bulk RNA-seq.
### Visualisation in heatmap
It looks like the ComplexHeatmap package might come to the rescue and be able to get the numerous genes, clusters and conditions plotted in a single heatmap so that the data do not have to be displayed as boxplots. I explore below whether this is the case:
### Session Info
```